The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. k-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
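Patch-based training, the most common workaround reported above for samples that are too large to process at once, can be illustrated with a minimal sketch. The function name `sample_random_patches`, the volume shape, and the patch size are assumptions chosen for illustration; they are not taken from the survey.

```python
import numpy as np

def sample_random_patches(volume, patch_size, n_patches, seed=0):
    """Draw n_patches random sub-volumes of shape patch_size from a 3D array."""
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_patches):
        # Pick a corner so that the patch fits entirely inside the volume.
        corner = [int(rng.integers(0, dim - p + 1))
                  for dim, p in zip(volume.shape, patch_size)]
        region = tuple(slice(c, c + p) for c, p in zip(corner, patch_size))
        patches.append(volume[region])
    return np.stack(patches)

# Hypothetical image volume that would be too large to feed to a model whole.
volume = np.zeros((64, 128, 128), dtype=np.float32)
patches = sample_random_patches(volume, patch_size=(32, 64, 64), n_patches=4)
```

Each training step then sees a batch of small patches instead of the full volume, which trades global context for a bounded memory footprint.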
Recently, Vehicle-to-Everything (V2X) cooperative perception has attracted increasing attention. Infrastructure sensors play a critical role in this research field; however, how to find the optimal placement of infrastructure sensors is rarely studied. In this paper, we investigate the problem of infrastructure sensor placement and propose a pipeline that can efficiently and effectively find optimal installation positions for infrastructure sensors in a realistic simulated environment. To better simulate and evaluate LiDAR placement, we establish a Realistic LiDAR Simulation library that can simulate the unique characteristics of different popular LiDARs and produce high-fidelity LiDAR point clouds in the CARLA simulator. By simulating point cloud data for different LiDAR placements, we can evaluate the perception accuracy of these placements using multiple detection models. Then, we analyze the correlation between the point cloud distribution and perception accuracy by calculating the density and uniformity of regions of interest. Experiments show that the placement of infrastructure LiDAR can heavily affect the accuracy of perception. We also analyze the correlation between perception performance in the region of interest and LiDAR point cloud distribution and validate that density and uniformity can be indicators of performance.
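The abstract does not state how density and uniformity are computed, so the following is only one plausible sketch. It assumes density is the number of points per unit volume inside an axis-aligned region of interest, and uniformity is the normalized entropy of the per-cell point counts on a regular grid. The function name and the grid resolution are hypothetical.

```python
import numpy as np

def roi_density_and_uniformity(points, roi_min, roi_max, cells_per_axis=4):
    """Return (density, uniformity) of a point cloud inside an axis-aligned ROI.

    density    : points per unit volume inside the ROI (assumed definition)
    uniformity : entropy of per-cell counts divided by its maximum, in [0, 1]
                 (assumed definition; 1 means perfectly even coverage)
    """
    points = np.asarray(points, dtype=float)
    roi_min = np.asarray(roi_min, dtype=float)
    roi_max = np.asarray(roi_max, dtype=float)

    # Keep only the points that fall inside the region of interest.
    inside = np.all((points >= roi_min) & (points < roi_max), axis=1)
    roi_points = points[inside]
    if len(roi_points) == 0:
        return 0.0, 0.0

    density = len(roi_points) / float(np.prod(roi_max - roi_min))

    # Count points per grid cell and measure how evenly they are spread.
    counts, _ = np.histogramdd(roi_points, bins=cells_per_axis,
                               range=list(zip(roi_min, roi_max)))
    probs = counts.ravel() / counts.sum()
    nonzero = probs[probs > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    uniformity = float(entropy / np.log(probs.size))
    return density, uniformity

# One point at the centre of every cell: evenly covered ROI.
grid = np.arange(4) + 0.5
even_points = np.array([(x, y, z) for x in grid for y in grid for z in grid])
density, uniformity = roi_density_and_uniformity(even_points, (0, 0, 0), (4, 4, 4))

# All points stacked in one spot: same density, no spread.
clustered = np.full((64, 3), 0.5)
_, clustered_uniformity = roi_density_and_uniformity(clustered, (0, 0, 0), (4, 4, 4))
```

Under these assumed definitions, two placements can yield the same density yet very different uniformity, which is why the two quantities are reported separately.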
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of the medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
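This paper investigates the task of 2D whole-body human pose estimation, which aims to localize keypoints on the entire human body, including the body, feet, face, and hands. We propose a single-network approach, termed ZoomNet, that takes into account the hierarchical structure of the full human body and addresses the scale variation of different body parts. We further propose a neural architecture search framework, termed ZoomNAS, to improve both the accuracy and the efficiency of whole-body pose estimation. ZoomNAS jointly searches the model architecture and the connections between different sub-modules, and automatically allocates computational complexity to the searched sub-modules. To train and evaluate ZoomNAS, we introduce the first large-scale 2D human whole-body dataset, COCO-WholeBody V1.0, which annotates 133 keypoints for in-the-wild images. Extensive experiments demonstrate the effectiveness of ZoomNAS and the significance of COCO-WholeBody V1.0.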
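It is well known that Transformers perform better than convolutional neural networks in semantic segmentation. Nevertheless, the original Vision Transformer may lack the inductive biases of local neighborhoods and has high time complexity. Recently, Swin Transformer set new records in various vision tasks by using a hierarchical architecture and shifted windows while being more efficient. However, because Swin Transformer is specifically designed for image classification, it may achieve suboptimal performance on dense prediction-based segmentation tasks. Moreover, simply combining Swin Transformer with existing methods would increase the model size and parameter count of the final segmentation model. In this paper, we rethink Swin Transformer for semantic segmentation and design a lightweight yet effective Transformer model, called SSFormer. In this model, considering the inherent hierarchical design of Swin Transformer, we propose a decoder that aggregates information from different layers, thereby obtaining both local and global attention. Experimental results show that the proposed SSFormer yields mIoU performance comparable to state-of-the-art models while maintaining a smaller model size and lower computation.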
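The mIoU metric used for the comparison above can be sketched as follows. This is a generic definition of mean intersection-over-union over label maps, not code from the paper. The function name and the choice to skip classes absent from both prediction and ground truth are assumptions; benchmarks differ on that detail.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for integer label maps."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skipped (assumed convention)
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Tiny 2x2 example with two classes.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2
```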
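Abdominal organ segmentation has many important clinical applications, such as organ quantification, surgical planning, and disease diagnosis. However, manually annotating organs from CT scans is time-consuming and labor-intensive. Semi-supervised learning has shown the potential to alleviate this challenge by learning from a large set of unlabeled images and a limited number of labeled samples. In this work, we follow the self-training strategy and employ a hybrid architecture (PHTrans) with CNN and Transformer to generate precise pseudo-labels. Afterward, we feed the pseudo-labels together with the labeled data into a two-stage segmentation framework with a lightweight PHTrans to improve the performance and generalization ability of the model while remaining efficient. Experiments on the validation set of FLARE2022 demonstrate that our method achieves excellent segmentation performance as well as fast and low-resource model inference. The average DSC and HSD are 0.8956 and 0.9316, respectively. Under our development environment, the average inference time is 18.62 s, the average maximum GPU memory is 1995.04 MB, and the area under the GPU memory-time curve and the average area under the CPU utilization-time curve are 23196.84 and 319.67, respectively.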
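For readers unfamiliar with the DSC figure quoted above, a minimal sketch of the Dice similarity coefficient for a single binary mask follows. It is the textbook definition rather than the challenge's evaluation code. The function name and the small `eps` smoothing term are assumptions added so that two empty masks score 1 instead of dividing by zero.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty (assumed convention).
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Prediction marks two voxels, ground truth marks one of them.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])  # 2*1 / (2 + 1)
```

A multi-organ score such as the one reported is typically obtained by computing this value per organ label and averaging, though the exact averaging protocol is not given in the abstract.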
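Existing works on 2D pose estimation mainly focus on a certain category, e.g., humans, animals, or vehicles. However, there are many application scenarios that require detecting the poses/keypoints of unseen object classes. In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definitions. To achieve this goal, we formulate the pose estimation problem as a keypoint matching problem and design a novel CAPE framework, termed POse Matching Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset of 100 object categories containing 20K instances, which is carefully designed for developing CAPE algorithms. Experiments show that our method outperforms other baseline methods. Code and data are available at https://github.com/luminxu/pose-for-venthing.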
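The idea of treating pose estimation as keypoint matching can be illustrated with a deliberately simplified sketch. It is not POMNet: the learned feature extractor and the transformer-based KIM are replaced here by plain cosine-similarity nearest-neighbour search, and the function name, feature shapes, and random features are all assumptions for illustration.

```python
import numpy as np

def match_keypoints(support_feats, query_feat_map):
    """Locate each support keypoint in a query feature map by cosine similarity.

    support_feats  : (K, C) one feature vector per annotated support keypoint
    query_feat_map : (C, H, W) dense features of the query image
    returns        : (K, 2) integer (row, col) of the best match per keypoint
    """
    channels, height, width = query_feat_map.shape
    query = query_feat_map.reshape(channels, height * width)
    # L2-normalize so that the dot product equals cosine similarity.
    query = query / (np.linalg.norm(query, axis=0, keepdims=True) + 1e-8)
    support = support_feats / (np.linalg.norm(support_feats, axis=1, keepdims=True) + 1e-8)
    similarity = support @ query                      # (K, H*W)
    best = similarity.argmax(axis=1)
    rows, cols = np.unravel_index(best, (height, width))
    return np.stack([rows, cols], axis=1)

# Synthetic check: copy two query features as "support" keypoints and
# verify that matching recovers their original locations.
rng = np.random.default_rng(0)
query_map = rng.normal(size=(16, 8, 8))
true_locations = [(1, 2), (5, 7)]
support = np.stack([query_map[:, r, c] for r, c in true_locations])
found = match_keypoints(support, query_map)
```

Because nothing in the matching step depends on the object category, the same procedure applies to any class for which a few annotated support samples exist, which is the property the CAPE formulation relies on.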
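Stochastic gradient descent (SGD) is the cornerstone of modern machine learning (ML) systems. Despite its computational efficiency, SGD requires random data access, which is inherently inefficient when implemented in systems that rely on block-addressable secondary storage such as HDD and SSD, e.g., TensorFlow/PyTorch over large files and in-DB ML systems. To address this impedance mismatch, various data shuffling strategies have been proposed to balance the convergence rate of SGD (which favors randomness) and its I/O performance (which favors sequential access). In this paper, we first conduct a systematic empirical study of existing data shuffling strategies, which reveals that all existing strategies have room for improvement: they all suffer in terms of either I/O performance or convergence rate. With this in mind, we propose a simple but novel hierarchical data shuffling strategy, CorgiPile. Compared with existing strategies, CorgiPile avoids a full data shuffle while maintaining a convergence rate of SGD comparable to that of a full shuffle. We provide a non-trivial theoretical analysis of CorgiPile's convergence behavior. We further integrate CorgiPile into PyTorch by designing new parallel/distributed shuffle operators inside a new CorgiPileDataSet API. We also integrate CorgiPile into PostgreSQL by introducing three new physical operators with optimizations. Our experimental results show that CorgiPile can achieve a convergence rate comparable to that of full-shuffle-based SGD for both deep learning and generalized linear models. For deep learning models on the ImageNet dataset, CorgiPile is 1.5x faster than PyTorch with full data shuffle. For in-DB ML with linear models, CorgiPile is 1.6x-12.8x faster than two state-of-the-art in-DB ML systems, Apache MADlib and Bismarck, on both HDD and SSD.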
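The abstract only describes the strategy as hierarchical, so the sketch below shows one plausible reading rather than the CorgiPile implementation: shuffle the order of contiguous storage blocks first, then shuffle individual items inside a small in-memory buffer of several blocks. The function name and all parameters are hypothetical.

```python
import random

def hierarchical_shuffle(num_items, block_size, blocks_per_buffer, seed=0):
    """Yield item indices in a two-level shuffled order.

    Level 1 permutes whole blocks, so each block can still be read from
    storage sequentially. Level 2 permutes items inside a buffer that
    holds a few blocks, adding item-level randomness at bounded memory cost.
    """
    rng = random.Random(seed)
    blocks = [list(range(start, min(start + block_size, num_items)))
              for start in range(0, num_items, block_size)]
    rng.shuffle(blocks)                      # level 1: block order
    for i in range(0, len(blocks), blocks_per_buffer):
        buffer = [idx for block in blocks[i:i + blocks_per_buffer] for idx in block]
        rng.shuffle(buffer)                  # level 2: items within the buffer
        yield from buffer

order = list(hierarchical_shuffle(num_items=20, block_size=4, blocks_per_buffer=2))
```

Every item is visited exactly once per pass, and random access is confined to the buffer. The result is less random than a full shuffle, which is the trade-off the paper's convergence analysis addresses.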
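The success of Transformers in computer vision has attracted increasing attention from the medical imaging community. Especially for medical image segmentation, many excellent hybrid architectures based on convolutional neural networks (CNNs) and Transformers have been presented and have achieved impressive performance. However, most of these methods, which embed modular Transformers into CNNs, struggle to reach their full potential. In this paper, we propose a novel hybrid architecture for medical image segmentation called PHTrans, which hybridizes Transformers and CNNs in the main building blocks to produce hierarchical representations from global and local features and adaptively aggregate them, aiming to fully exploit their strengths to obtain better segmentation performance. Specifically, PHTrans follows the U-shaped encoder-decoder design and introduces a parallel hybrid module in the deep stages, where convolution blocks and a modified 3D Swin Transformer learn local features and global dependencies separately, after which the dimensions of their outputs are unified to achieve feature aggregation. Extensive experimental results on both the Multi-Atlas Labeling Beyond the Cranial Vault and the Automated Cardiac Diagnosis Challenge datasets corroborate its effectiveness, consistently outperforming state-of-the-art methods. The code is available at: https://github.com/lseventeen/phtrans.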
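K-core decomposition is a commonly used metric for analyzing graph structure or studying the relative importance of nodes in complex graphs. In recent years, the scale of graphs has grown rapidly, especially in industrial settings. For example, our industrial partner runs popular social applications with billions of users and is able to gather a rich set of user data. As a result, applying k-core decomposition to large graphs has attracted more and more attention from both academia and industry. A simple but effective way to deal with large graphs is to process them in a distributed setting, and several distributed k-core decomposition algorithms have been proposed. Despite their effectiveness, we observe experimentally and theoretically that these algorithms consume too many resources and become unstable on super-large-scale graphs, especially when the given resources are limited. In this paper, we deal with those super-large-scale graphs and propose a divide-and-conquer strategy on top of the distributed k-core decomposition algorithm. We evaluate our approach on three large graphs. The experimental results show that resource consumption can be significantly reduced and that the computation on large-scale graphs is more stable than with existing methods. For example, with our divide-and-conquer technique, the distributed k-core decomposition algorithm can scale to a large graph with 136 billion edges without losing correctness.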
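For reference, the quantity being computed can be sketched with the classic single-machine peeling procedure: repeatedly remove a node of minimum remaining degree and record the largest minimum degree seen so far as its core number. This is neither the distributed algorithm nor the divide-and-conquer strategy of the paper. The function name is hypothetical, and the simple minimum search makes it quadratic, so it only suits small graphs.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as edge pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:                       # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adjacency[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core

# A triangle (a, b, c) with one pendant node d attached to a.
cores = core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
```

In this example the triangle forms the 2-core, while the pendant node belongs only to the 1-core.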